This document explores Tesla’s stock prices from its initial public offering on June 29, 2010, to March 17, 2017. Tesla’s share price has risen sharply over this period, and machine learning algorithms may reveal seasonality or autocorrelation in the series. Predictive analysis, including machine learning, can help investors manage the uncertainty of the market. This time-series analysis aims to forecast trends in Tesla’s future stock prices. The forecasts will be compared with present-day stock values to assess the accuracy of the algorithms, alongside a review of the factors that have affected the stock price to date (e.g. supply and demand, the economy, stock splits).
library(dplyr)
library(lubridate)
library(summarytools)
library(corrplot)
library(tseries)
library(ggplot2)
library(plotly)
library(formattable)
library(dygraphs)
library(hrbrthemes)
tesla <- read.csv("./Tesla.csv")
The data analysis stage begins with cleaning and inspecting the data for inconsistencies. Following these steps, the data may undergo transformations and modelling as required. As part of the data preparation stage, the data are previewed, the ‘Date’ attribute is converted to a date type, missing values are counted, and the correlations between attributes are examined.
head(tesla)
## Date Open High Low Close Volume Adj.Close
## 1 6/29/2010 19.00 25.00 17.54 23.89 18766300 23.89
## 2 6/30/2010 25.79 30.42 23.30 23.83 17187100 23.83
## 3 7/1/2010 25.00 25.92 20.27 21.96 8218800 21.96
## 4 7/2/2010 23.00 23.10 18.71 19.20 5139800 19.20
## 5 7/6/2010 20.00 20.00 15.83 16.11 6866900 16.11
## 6 7/7/2010 16.40 16.63 14.98 15.80 6921700 15.80
str(tesla)
## 'data.frame': 1692 obs. of 7 variables:
## $ Date : Factor w/ 1692 levels "1/10/2011","1/10/2012",..: 1206 1216 1246 1301 1373 1378 1383 1389 1261 1266 ...
## $ Open : num 19 25.8 25 23 20 ...
## $ High : num 25 30.4 25.9 23.1 20 ...
## $ Low : num 17.5 23.3 20.3 18.7 15.8 ...
## $ Close : num 23.9 23.8 22 19.2 16.1 ...
## $ Volume : int 18766300 17187100 8218800 5139800 6866900 6921700 7711400 4050600 2202500 2680100 ...
## $ Adj.Close: num 23.9 23.8 22 19.2 16.1 ...
tesla$Date <- as.Date(tesla$Date, format = "%m/%d/%Y")
class(tesla$Date)
## [1] "Date"
The ‘Date’ attribute was converted from a factor to the Date class.
sum(is.na(tesla))
## [1] 0
There are no missing values.
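The missing-value check matters here because `as.Date()` returns `NA` without raising an error when a string does not match the supplied format. The sketch below uses three rows copied from the `head(tesla)` output; the name `sample_rows` and the deliberately mismatched date string are illustrative assumptions, not part of the original analysis.

```r
# Three rows copied from the head(tesla) output above (illustrative subset)
sample_rows <- data.frame(
  Date  = c("6/29/2010", "6/30/2010", "7/1/2010"),
  Close = c(23.89, 23.83, 21.96),
  stringsAsFactors = FALSE
)
sample_rows$Date <- as.Date(sample_rows$Date, format = "%m/%d/%Y")

# Missing values per column; a failed date parse would appear as NA in Date
na_per_column <- colSums(is.na(sample_rows))
print(na_per_column)

# A string in a different layout silently becomes NA instead of raising an error
bad_parse <- as.Date("2010-06-29", format = "%m/%d/%Y")
print(is.na(bad_parse))
```

Counting missing values per column, rather than in total, shows which attribute is affected if a parse ever fails.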
Next, a correlation plot will determine whether the attributes are correlated and to what degree.
x <- cor(tesla[2:7])
x
## Open High Low Close Volume Adj.Close
## Open 1.0000000 0.9996232 0.9996050 0.9992333 0.4075155 0.9992333
## High 0.9996232 1.0000000 0.9995214 0.9996909 0.4164661 0.9996909
## Low 0.9996050 0.9995214 1.0000000 0.9996561 0.3976155 0.9996561
## Close 0.9992333 0.9996909 0.9996561 1.0000000 0.4069072 1.0000000
## Volume 0.4075155 0.4164661 0.3976155 0.4069072 1.0000000 0.4069072
## Adj.Close 0.9992333 0.9996909 0.9996561 1.0000000 0.4069072 1.0000000
corrplot(x, type = "upper", order = "hclust")
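One way to read a correlation matrix like this is to list only the pairs above a cutoff. The sketch below rebuilds the six rows printed by `head(tesla)` and is only a small-sample illustration; the 0.99 cutoff and the names `prices`, `cm`, and `high` are assumptions made for the example.

```r
# Six rows copied from the head(tesla) output (illustrative subset only)
prices <- data.frame(
  Open      = c(19.00, 25.79, 25.00, 23.00, 20.00, 16.40),
  High      = c(25.00, 30.42, 25.92, 23.10, 20.00, 16.63),
  Low       = c(17.54, 23.30, 20.27, 18.71, 15.83, 14.98),
  Close     = c(23.89, 23.83, 21.96, 19.20, 16.11, 15.80),
  Volume    = c(18766300, 17187100, 8218800, 5139800, 6866900, 6921700),
  Adj.Close = c(23.89, 23.83, 21.96, 19.20, 16.11, 15.80)
)

cm <- cor(prices)

# Keep each pair once (upper triangle) and report those above the cutoff
threshold <- 0.99  # hypothetical cutoff, chosen for illustration
high <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)
print(data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r    = cm[high]
))

# In these rows Close and Adj.Close are identical, so their correlation is 1
print(identical(prices$Close, prices$Adj.Close))
```

A correlation of exactly 1 between two attributes, as the full matrix reports for Close and Adj.Close, means one of them adds no information for modelling.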
To understand the attributes further, the measures of central tendency and dispersion will be reviewed.
Descriptive Statistics of Tesla Stock
| | Mean | Median | Std.Dev | Max | Min |
|---|---|---|---|---|---|
| Open | 132.44 | 156.33 | 94.31 | 287.67 | 16.14 |
| High | 134.77 | 162.37 | 95.69 | 291.42 | 16.63 |
| Low | 130.00 | 153.15 | 92.86 | 280.40 | 14.98 |
| Close | 132.43 | 158.16 | 94.31 | 286.04 | 15.80 |
| Volume | 4270740.90 | 3180700.00 | 4295971.35 | 37163900.00 | 118500.00 |
| Adj.Close | 132.43 | 158.16 | 94.31 | 286.04 | 15.80 |
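The code that produced this table is not shown. As a rough sketch, the same five statistics can be computed in base R. The helper `describe()` and the three-column subset of the `head(tesla)` rows are assumptions for illustration, and the original may have used `summarytools` instead.

```r
# Subset of the head(tesla) rows, used only to illustrate the calculation
prices <- data.frame(
  Open   = c(19.00, 25.79, 25.00, 23.00, 20.00, 16.40),
  Close  = c(23.89, 23.83, 21.96, 19.20, 16.11, 15.80),
  Volume = c(18766300, 17187100, 8218800, 5139800, 6866900, 6921700)
)

# Hypothetical helper returning the five statistics shown in the table
describe <- function(x) {
  c(Mean = mean(x), Median = median(x), Std.Dev = sd(x),
    Max = max(x), Min = min(x))
}

# One row per attribute, one column per statistic
stats_table <- t(sapply(prices, describe))
print(round(stats_table, 2))
```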
These values show the range of Tesla’s stock prices over time. Notably, the gap between the minimum and maximum values is large, which likely reflects trends over several years. Since this is a time-series dataset starting from the initial public offering, and the value of the stock has changed drastically over several years, it is unlikely that the extreme values are outliers. For the purposes of this investigation, the closing stock price (i.e. the value of the stock at the end of the trading day) will be used. The table below displays the trends of the closing stock prices from 2010 to 2017.
Closing Tesla Stock Price Statistics
| Year | Min | Max | Average | % Change per Fiscal Year |
|---|---|---|---|---|
| 2010 | 15.80 | 35.47 | 23.34 | 11.47 |
| 2011 | 21.83 | 34.94 | 26.80 | 7.29 |
| 2012 | 22.79 | 38.01 | 31.17 | 20.62 |
| 2013 | 32.91 | 193.37 | 104.40 | 325.42 |
| 2014 | 139.34 | 286.04 | 223.33 | 48.17 |
| 2015 | 185.00 | 282.26 | 230.04 | 9.44 |
| 2016 | 143.67 | 265.42 | 209.77 | -4.35 |
| 2017 | 216.99 | 280.98 | 251.30 | 20.51 |
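A per-year summary of this kind can be sketched in base R. The series below is simulated rather than taken from the Tesla data, and it uses calendar days for simplicity. The percentage change is assumed to be measured from the first to the last close within each year, because the original definition is not shown.

```r
set.seed(1)

# Simulated positive price series over two calendar years (not Tesla data)
dates <- seq(as.Date("2010-06-29"), as.Date("2011-12-30"), by = "day")
close <- 20 * exp(cumsum(rnorm(length(dates), sd = 0.02)))
sim   <- data.frame(Date = dates, Close = close)
sim$Year <- format(sim$Date, "%Y")

# Summarise each year: min, max, mean, and first-to-last percentage change
yearly <- do.call(rbind, lapply(split(sim, sim$Year), function(d) {
  d <- d[order(d$Date), ]
  data.frame(
    Year      = d$Year[1],
    Min       = min(d$Close),
    Max       = max(d$Close),
    Average   = mean(d$Close),
    PctChange = 100 * (d$Close[nrow(d)] - d$Close[1]) / d$Close[1]
  )
}))
print(yearly)
```

Under this definition, the first and last years cover only part of a year, so their percentage changes are not directly comparable with those of full years.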
Next, visualizations will be used to observe trends in the data. To obtain the most accurate forecasts, several properties of a time series must be considered. The following data exploration will examine whether the series shows autocorrelation, seasonality, and stationarity.
The first two visualizations will display the closing price of Tesla stock per day.
Histogram of Closing Price
Now that the closing stock prices have been visualized, as shown above, it is important to determine whether autocorrelation is present in the data. Autocorrelation refers to the degree of similarity between a series and a lagged version of itself. In other words, this assessment determines whether the data depend on their past values.
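To make the idea concrete, the sketch below compares the sample autocorrelation of two simulated series using `acf()` from base R. One is a random walk, where each value builds on the previous one, and the other is independent noise. Both series are simulated for illustration and are not the Tesla data.

```r
set.seed(42)
n <- 500

random_walk <- cumsum(rnorm(n))  # each value depends on the previous one
white_noise <- rnorm(n)          # values are independent of their past

# plot = FALSE returns the autocorrelations instead of drawing them
acf_walk  <- acf(random_walk, lag.max = 20, plot = FALSE)
acf_noise <- acf(white_noise, lag.max = 20, plot = FALSE)

# Element 1 is lag 0 (always 1), so element 2 is the lag-1 autocorrelation
lag1_walk  <- acf_walk$acf[2]
lag1_noise <- acf_noise$acf[2]
print(c(random_walk = lag1_walk, white_noise = lag1_noise))
```

A lag-1 value close to 1, as for the random walk, indicates strong dependence on the past. Daily stock prices often behave more like the random walk than like the noise series.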
Seasonality
Stationarity
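As an illustration of how stationarity might be checked, the sketch below applies the Phillips-Perron unit-root test (`PP.test()` from base R's stats package) to a simulated random walk and to its first difference. The data are simulated, not the Tesla series. The `tseries` package loaded earlier also provides `adf.test()`; whether the original analysis used it is not shown.

```r
set.seed(123)

level_series <- cumsum(rnorm(500))  # simulated random walk (has a unit root)
diff_series  <- diff(level_series)  # first difference removes the unit root

# Null hypothesis of the test: the series has a unit root (is non-stationary)
pp_level <- PP.test(level_series)
pp_diff  <- PP.test(diff_series)

p_level <- pp_level$p.value
p_diff  <- pp_diff$p.value
print(c(level = p_level, differenced = p_diff))
```

A small p-value rejects the unit-root hypothesis. When the level series fails the test but its difference passes, differencing is the usual remedy before modelling.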
Although the dataset has 7 attributes, the focus will be on “Date” and “Close”, since the goal of this project is to forecast the closing stock price as a univariate time series.
ARIMA model
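The following is a minimal sketch of fitting and forecasting with `arima()` from base R on a simulated series. The order `c(1, 1, 0)`, the five-step horizon, and the simulated data are all assumptions for illustration; the order chosen in the original analysis is not shown.

```r
set.seed(7)

# Simulated upward-drifting series standing in for closing prices
sim_close <- 20 + cumsum(rnorm(300, mean = 0.1, sd = 1))

# ARIMA(1, 1, 0): one autoregressive term on the first-differenced series
fit <- arima(sim_close, order = c(1, 1, 0))
print(fit)

# Point forecasts and standard errors for the next five observations
fc <- predict(fit, n.ahead = 5)
print(fc$pred)
print(fc$se)
```

The forecast standard errors grow with the horizon, reflecting the increasing uncertainty of predictions further into the future.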